A Compiler-Based Approach to Schema-Specific XML Parsing

Authors

  • Kenneth Chiu
  • Wei Lu
Abstract

The validation of XML instances against a schema is usually performed separately from the parsing of the more basic syntactic aspects of XML. We posit, however, that schema information can be used during parsing to improve performance, using what we call schema-specific parsing. This paper develops a framework for schema-specific parsing centered on an intermediate representation we call generalized automata, which abstracts the computational steps necessary to validate against a schema. The generalized automata can then be used to generate optimized code which might be onerous to write manually. We present results that suggest this is a viable approach to high-performance XML parsing.
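As a rough illustration of the idea of compiling a schema into an automaton, the sketch below hand-encodes a toy content model as a table-driven finite automaton. It is not the paper's generalized automata or its generated code. The content model (an `id` element followed by one or more `item` elements), the state names, and the function `validate_children` are all hypothetical. The sketch covers only the validation step over a sequence of child element names, not the fused parsing and validation the abstract describes.

```python
# Hypothetical toy content model: exactly one <id>, then one or more <item>.
# Each key is (current_state, element_name); each value is the next state.
TRANSITIONS = {
    ("start", "id"): "after_id",
    ("after_id", "item"): "items",
    ("items", "item"): "items",
}

# States in which the element sequence may legally end.
ACCEPTING = {"items"}


def validate_children(child_names):
    """Return True if the sequence of child element names fits the toy model."""
    state = "start"
    for name in child_names:
        state = TRANSITIONS.get((state, name))
        if state is None:
            # No transition defined: the element is not allowed here.
            return False
    return state in ACCEPTING
```

A code generator could, in principle, emit such a table (or equivalent branching code) from a schema, so that each incoming element costs one lookup instead of a generic validation pass.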

Similar resources

A Compiler-Based Approach to Schema-Specific Parsers for XML

The Extensible Markup Language (XML) has become the de facto standard for interoperable data representation. Its human-readable, general syntax provides wide applicability and ease-of-use. These same characteristics, however, complicate the efficient processing of XML, and have created concerns about the performance of XML for distributed systems such as Web services. XML parsers are generally ...


Constructing Finite State Automata for High-Performance XML Web Services

This paper describes a validating XML parsing method based on deterministic finite state automata (DFA). XML parsing and validation are performed by a schema-specific XML parser that encodes the admissible parsing states as a DFA. This DFA is automatically constructed from the XML schemas of XML messages using a code generator. A two-level DFA architecture is used to increase efficiency and to re...


Toward Remote Object Coherence with Compiled Object Serialization for Distributed Computing with XML Web Services

Cross-platform object-level coherence in Web services-based distributed systems and grids requires lossless serialization to ensure programming-language specific objects are safely transmitted, manipulated, and stored. However, Web services development tools often suffer from lossy forms of XML serialization, which diminishes the usefulness of XML Web services as a competitive approach to binar...


Data-Binding Facility for the Java Platform

Sun Microsystems has recently undertaken to provide basic support for XML in the Java Platform. The proposed facilities include both an event-driven, SAX-compliant parser and an implementation of the W3C DOM (Document Object Model) parse-tree API. This is a critical first step, but using these fairly low-level APIs does require a moderately sophisticated understanding of XML. In order to make X...


JavaML 2.0: Enriching the Markup Language for Java Source Code

Although the representation of source code in plain text format is convenient for manipulation by programmers, it is not an effective format for processing by software engineering tools at an abstraction level suitable for source code analysis, reverse-engineering, or refactoring. Textual source code files require language-specific parsing to uncover program structure, a task undertaken by all ...


Publication date: 2003